Motivation

This project investigates the relationship between county-level age demographics and economic outcomes, specifically focusing on debt-to-income ratios. As counties across the U.S. experience demographic shifts, particularly the aging of the population, local governments face fiscal pressures tied to rising expenditures on retirement benefits and healthcare, coupled with reduced tax revenue from an aging workforce. We hypothesize that a higher proportion of older residents correlates with and potentially drives higher county-level debt-to-income ratios.

This analysis is motivated by the need to understand how demographic trends influence fiscal health at the local level. Counties with larger elderly populations may prioritize policies that cater to retirees—such as increased spending on pensions—potentially exacerbating debt-to-income ratios. Conversely, counties with younger populations may exhibit different fiscal dynamics, such as higher workforce participation and tax revenues.

The primary research question question of this research: Does the age distribution of a county’s population have a causal impact on debt-to-income ratios, potentially driven by shifts in local government spending and tax revenues?

In this research, we assume that tax revenues can serve as a proxy for local government policies. This assumption is based on the idea that tax revenue levels reflect fiscal priorities, including how resources are allocated and expenditures are balanced to address demographic shifts and economic demands.

Methods

We begin by preparing the dataset for econometric analysis. The dataset includes various demographic and economic variables at the county level across multiple years. The dataset is cleaned and transformed to make it suitable for panel data analysis.

This is a descriptive regression analysis where we use ElasticNet regression to quantify the associations and perform predictive analysis. The motivation behind this study is to understand how different factors influence the debt-to-income ratio at the county level and use the model to predict future values of this ratio. In addition to this, we use the variables selected in the elasticnet model to create a linear model on the panel data for potential causal inference.

To better understand this relationship from the standpoint of causal inference, we also applied a fixed effects panel regression model. This model controls for individual heterogeneity at the county level, allowing for more precise estimation of relationships between the variables and the target debt-to-income ratio. The motivation here is to explore how these factors change over time within each county, thereby accounting for unobserved time-invariant characteristics.

Panel Data Structure

The dataset is first transformed into a panel format by creating a unique identifier for each county based on its state and county name. We then ensure that the data is structured correctly for panel analysis, using UniqueID and Year as the panel indices. After that, we standardize all numeric variables within each county (identified by UniqueID) to account for differing scales of the variables.

Predictive Modeling with Elasticnet

We selected the Annual_Debt_to_income_ratio_low variable as the target for prediction. To predict this variable, we use ElasticNet regression, which is a regularization technique that combines both Lasso (L1) and Ridge (L2) penalties. ElasticNet is particularly useful when there are many correlated predictors.

The data is split into a training set (80%) and a test set (20%). We then fit the ElasticNet model using cross-validation to find the optimal regularization parameter, lambda. The cross-validation results indicate the best value of lambda to minimize prediction error.

Annual_Debt_to_income_ratio_low was chosen as the outcome variable because it serves as a key indicator of household financial health at the county level. This metric reflects the balance between debt obligations and income, providing insight into the economic well-being of a county’s residents on average. Predicting this ratio has particular import to several groups. For instance, policymakers can use it as a tool for identifying financial stress in order to specify policy prescriptions. Similarly local governments can make use of these predictions to more efficiently allocate resources. Business executives may find the debt-to-income ratio important when it comes to identifying areas that have potential. This can affect decisions about investments, credit offerings, and business expansion opportunities.

Results

Cross-Validation and Model Fit

The ElasticNet model is fitted on the training data, and we obtain the optimal lambda value from cross-validation. The lambda that minimizes the mean squared error is:

Optimal lambda (from cross-validation): 0.0002784226

The model’s performance is evaluated on the test set by calculating the Mean Squared Error (MSE):

MSE on Test Set: 0.813388

This indicates that the model has a relatively low error, suggesting a good fit for the data.

Coefficients

The model’s coefficients represent the relationship between each predictor and the target variable, Annual_Debt_to_income_ratio_low. Some key coefficients are:

  • Positive coefficients: Variables like Compensation and Taxes on production and imports have positive coefficients, indicating that as these variables increase, the Annual_Debt_to_income_ratio_low tends to increase as well.
  • Negative coefficients: Variables such as Real GDP show negative coefficients, meaning that as GDP increases, the debt-to-income ratio tends to decrease.

The coefficients help quantify the magnitude and direction of relationships between the predictors and the response variable.

Performance Evaluation

The performance of the ElasticNet model is assessed through cross-validation and MSE. The optimal lambda value helps balance model complexity and predictive accuracy. The low MSE indicates that the model is able to predict the debt-to-income ratio with reasonable accuracy.

Interpretation

Predictive Modeling Insights

The primary focus of this analysis was predictive modeling, where we used ElasticNet to predict the Annual_Debt_to_income_ratio_low based on a variety of socioeconomic and fiscal variables. The model’s performance, reflected in the MSE, suggests that the selected variables capture important information for predicting the target variable.

Coefficient Interpretation

  • The positive coefficients indicate variables that contribute to increasing the debt-to-income ratio, such as higher compensation and taxes on production and imports.
  • Conversely, negative coefficients like that for GDP suggest that stronger economic performance (as measured by GDP) is associated with lower debt-to-income ratios, which could be indicative of improved fiscal health.

Limitations and Further Steps

While the predictive model provides useful insights, there are limitations to its ability to capture complex, non-linear relationships between variables. Future steps could include exploring more advanced machine learning techniques or using the model for more granular forecasting. Additionally, understanding the potential sources of bias, especially in terms of omitted variables or unmeasured confounders, is important for making the results more reliable.

Additional Analysis: Fixed Effects Panel Regression

Methods

The process involved several steps: 1. Variable Selection: Based on prior analysis, we selected a subset of variables that were found to have significant coefficients in the ElasticNet model. These variables are likely to have the most influence on the debt-to-income ratio, and using elasticnet to select them helps reduce the likelihood of our analysis being influenced by omitted variables bias.

  1. Data Preparation: A new dataset was created, which includes only the UniqueID, Year, Annual_Debt_to_income_ratio_low, and the selected variables. This dataset was then arranged by county (UniqueID) and year (Year).

  2. Fixed Effects Model: A fixed effects panel regression was run using the plm package in R, with Annual_Debt_to_income_ratio_low as the dependent variable and all selected variables as independent variables. The model controls for individual (county) effects by including a “within” specification.

Results

While the model was able to be computed, it was riddled with multicolinearity among the regressors which made its model summary uncomputable. Removing the multicolinearly would require us to remove most of the regressors in our model, which would defeat the purpose of using elasticnet as a method of selecting regressors such to minimize risk of ommitted variables bias. We were thus unable to utilize the predictive model in service of causal inference. However, this did provide us with additional evidence that, given our chosen candidate variables of interest, that the question at hand was better suited to predictive analysis using elasticnet rather than causal inference. We therefore include the above discussion of the regression.

Limitations and Further Steps

The fixed effects model accounts for unobserved county-level heterogeneity but does not address potential endogeneity or omitted variable bias that could still affect the estimates. Further analysis could include instrumental variable approaches or explore other econometric techniques to deal with these concerns. Additionally, the model assumes linear relationships between predictors and the outcome, which may not fully capture the complexity of the underlying dynamics.

Conclusion

This analysis provides a predictive framework for understanding the relationship between various socioeconomic and fiscal factors and the Annual_Debt_to_income_ratio_low. ElasticNet regression has been employed to quantify these relationships and predict the debt-to-income ratio. The results show meaningful associations between predictors and the target variable, and the model has demonstrated good predictive performance.

Future work could involve exploring causal inference techniques or enhancing the model with more complex machine learning algorithms to further improve prediction accuracy.

AI Disclosure

Both authors of this report made use of ChatGPT during the research and writing process. No other AI services were used. There were three main use cases for this AI. The first and primary use was for coding assistance in R. This includes translating descriptions of desired data transformations into code chunks, analyzing written code for errors, and suggesting functions for specific use cases. ChatGPT was particularly helpful in generating and modifying visual elements and figures for the report. For example, if the title of a graphic was hard to read ChatGPT could provide a list of potential improvements that could be tried or discarded. The second use of AI was in its capacity to generate writing prompts. Instructions and data were fed into the AI in order to generate ideas of what to write about as well as structure to the writing. Notably, there was intentional effort to not directly copy and paste any ChatGPT output. This applies to code as well; the majority of incorperated information generated by ChatGPT was used to inspire novel approaches or troubleshoot problems, rather than as direct inputs in the final product. A notable exception is a minority is some code snippets that were directly pulled. The final use of AI in the project was in the form of proofreading. Select sections of text were parsed by the language model and asked for spelling and grammar checks.

Visualizations

Age Proportion Distribution Animation

Age Proportion Distribution Animation
Age Proportion Distribution Animation

Aggregate State Tax Revenue by Year

Tax Revenue Over Time
Tax Revenue Over Time

Debt to Income Distribution Animation

Debt to Income Distribution Animation
Debt to Income Distribution Animation

Elasticnet Output

Elasticnet Output
Elasticnet Output

Population Heatmap Animation

Population Heatmap Animation
Population Heatmap Animation

Real GDP Histogram (Log)

Real GDP Histogram
Real GDP Histogram